FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter
نویسندگان
چکیده
This paper reports the description and performance of our system, FBK-HLT, participating in the SemEval 2015, Task #1 "Paraphrase and Semantic Similarity in Twitter", for both subtasks. We submitted two runs with different classifiers in combining typical features (lexical similarity, string similarity, word n-grams, etc) with machine translation metrics and edit distance features. We outperform the baseline system and achieve a very competitive result to the best system on the first subtask. Eventually, we are ranked 4th out of 18 teams participating in subtask "Paraphrase Identification".
منابع مشابه
AMRITA_CEN$@$SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders
We explore using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system makes use of phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on computing semantic s...
متن کاملEbiquity: Paraphrase and Semantic Similarity in Twitter using Skipgrams
We describe the system we developed to participate in SemEval 2015 Task 1, Paraphrase and Semantic Similarity in Twitter. We create similarity vectors from two-skip trigrams of preprocessed tweets and measure their semantic similarity using our UMBC-STS system. We submit two runs. The best result is ranked eleventh out of eighteen teams with F1 score of 0.599.
متن کاملMITRE: Seven Systems for Semantic Similarity in Tweets
This paper describes MITRE’s participation in the Paraphrase and Semantic Similarity in Twitter task (SemEval-2015 Task 1). This effort placed first in Semantic Similarity and second in Paraphrase Identification with scores of Pearson’s r of 61.9%, F1 of 66.7%, and maxF1 of 72.4%. We detail the approaches we explored including mixtures of string matching metrics, alignments using tweet-specific...
متن کاملSemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
In this shared task, we present evaluations on two related tasks Paraphrase Identification (PI) and Semantic Textual Similarity (SS) systems for the Twitter data. Given a pair of sentences, participants are asked to produce a binary yes/no judgement or a graded score to measure their semantic equivalence. The task features a newly constructed Twitter Paraphrase Corpus that contains 18,762 sente...
متن کاملFBK-HLT: A New Framework for Semantic Textual Similarity
This paper reports the description and performance of our system, FBK-HLT, participating in the SemEval 2015, Task #2 “Semantic Textual Similarity”, English subtask. We submitted three runs with different hypothesis in combining typical features (lexical similarity, string similarity, word n-grams, etc) with syntactic structure features, resulting in different sets of features. The results eval...
متن کامل